Skip to content

[Relax][TOPI] Add relax.vision.multibox_transform_loc for SSD/TFLite box decode#18942

Merged
tlopex merged 4 commits intoapache:mainfrom
Dayuxiaoshui:main
Mar 28, 2026
Merged

[Relax][TOPI] Add relax.vision.multibox_transform_loc for SSD/TFLite box decode#18942
tlopex merged 4 commits intoapache:mainfrom
Dayuxiaoshui:main

Conversation

@Dayuxiaoshui
Copy link
Copy Markdown
Contributor

Introduce relax.vision.multibox_transform_loc with MultiboxTransformLocAttrs: decode center-size offsets against ltrb priors, softmax on class logits, and optional clip, threshold masking, and background score zeroing. Register the C++ op with FInferStructInfo checks for shapes and dtypes (including batch and 4*N consistency). Legalize to topi.vision.multibox_transform_loc.

Add tests for struct inference, invalid inputs, Legalize+e2e on LLVM, attribute branches, and TVMScript roundtrip. Add a standalone numpy reference under topi/testing (not exported from tvm.topi.testing to avoid pulling scipy).

Update TFLite frontend NotImplementedError text for DETECTION_POSTPROCESS and NON_MAX_SUPPRESSION_V5 to note multibox is available and link tracking issue #18928.

…box decode

Introduce relax.vision.multibox_transform_loc with MultiboxTransformLocAttrs: decode center-size offsets against ltrb priors, softmax on class logits, and optional clip, threshold masking, and background score zeroing. Register the C++ op with FInferStructInfo checks for shapes and dtypes (including batch and 4*N consistency). Legalize to topi.vision.multibox_transform_loc.

Add tests for struct inference, invalid inputs, Legalize+e2e on LLVM, attribute branches, and TVMScript roundtrip. Add a standalone numpy reference under topi/testing (not exported from tvm.topi.testing to avoid pulling scipy).

Update TFLite frontend NotImplementedError text for DETECTION_POSTPROCESS and NON_MAX_SUPPRESSION_V5 to note multibox is available and link tracking issue apache#18928.
@Dayuxiaoshui
Copy link
Copy Markdown
Contributor Author

cc @tlopex

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the multibox_transform_loc operator to TVM Relax, enabling SSD and TFLite-style box decoding. The implementation includes the Relax operator definition, C++ attributes, struct info inference, and a TOPI implementation using TE. Additionally, a NumPy reference implementation and comprehensive unit tests for correctness and TVMScript parsing are provided. Review feedback suggests simplifying nested conditionals in the TOPI implementation for better readability and removing redundant type casts in the reference implementation.

Copy link
Copy Markdown
Member

@tlopex tlopex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some issues need to be improved

  • There are still a couple of lines over the 100-column limit in multibox_transform_loc.cc. Also, the UTF-8 in the argument description should probably be replaced with -> or to.
  • topi/testing/multibox_transform_loc_python.py looks unused right now — the tests already carry an identical inline copy. Either reuse the helper from topi/testing or remove the dead file.
  • test_multibox_transform_loc_wrong_shape_relation does not currently hit the cross-shape validation, since loc_pred.shape[1] == 19 already fails the divisibility check. Please add a case like 24 to cover the intended path.
  • The LLVM E2E tests should be marked with @tvm.testing.requires_llvm so they skip cleanly on shards without LLVM.
  • All current E2E tests use variances = (1.0, 1.0, 1.0, 1.0), so the variance scaling logic is effectively a no-op in test coverage. It would be good to add a case with non-unity variances (e.g. (0.1, 0.1, 0.2, 0.2)) to exercise that path end-to-end.

@Dayuxiaoshui
Copy link
Copy Markdown
Contributor Author

cc @tlopex

- multibox_transform_loc.cc: wrap long lines, ASCII loc_pred doc (no Unicode arrow).
- vision.h: wrap MultiboxTransformLoc clip reflection line to 100 cols.
- Remove unused topi/testing/multibox_transform_loc_python.py; keep numpy ref in tests.
- test_op_vision: loc_dim=24 vs 4*N infer case; @tvm.testing.requires_llvm on LLVM e2e
  (multibox + all_class_nms); add e2e with non-unity variances.
Copy link
Copy Markdown
Member

@tlopex tlopex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM,
A few minor follow-ups:

  • Please consider documenting that shape cross-checking is partial when cls_shape is unknown.
  • Please consider adding a short note on the expected variances range, since very large values could overflow te.exp(...).

@Dayuxiaoshui
Copy link
Copy Markdown
Contributor Author

@tlopex Thanks for the follow-ups — addressed in the latest commit. We documented that when cls_pred has unknown shape, FInferStructInfo only returns generic rank-3 outputs and skips N-based cross-checks (e.g. loc_pred.shape[1] == 4*N and anchor.shape[1] == N) in a Doxygen note on InferStructInfoMultiboxTransformLoc, the op .describe(...) string, and the Python op docstring Notes. We also added a short warning that very large w/h variances can overflow exp in the half-height/width decode path, in MultiboxTransformLocAttrs reflection text, the Python variances parameter docs, and the same op description.

@tlopex
Copy link
Copy Markdown
Member

tlopex commented Mar 28, 2026

Thank you! LGTM

@tlopex tlopex merged commit 3eb86f7 into apache:main Mar 28, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants